A Hybrid Extraction Model for Chinese Noun/Verb Synonym bi-gram Collocations
نویسندگان
چکیده
Statistical-based collocation extraction approaches suffer from (1) low precision rate because high co-occurrence bi-grams may be syntactically unrelated and are thus not true collocations; (2) low recall rate because some true collocations with low occurrences cannot be identified successfully by statistical-based models. To integrate both syntactic rules as well as semantic knowledge into a statistical model for collocation extraction is one way to achieve a high precision while keeping a reasonable recall. This paper designs a cascade system which employs a hybrid model by integrating both syntactic and semantic knowledge into a statistical model for Chinese synonymous noun/verb collocations extraction. The grammatically bounded noun/verb collocations are extracted first from a syntactic-rule based module, which is then inputted to a semantic-based module for further retrieval of low frequent bi-gram collocations.
منابع مشابه
A Hybrid Extraction Model for Chinese Noun/Verb Synonymous bi-gram Collocations
Statistical-based collocation extraction approaches suffer from (1) low precision rate because high co-occurrence bi-grams may be syntactically unrelated and are thus not true collocations; (2) low recall rate because some true collocations with low occurrences cannot be identified successfully by statistical-based models. To integrate both syntactic rules as well as semantic knowledge into a s...
متن کاملUsing Synonym Relations in Chinese Collocation Extraction
A challenging task in Chinese collocation extraction is to improve both the precision and recall rate. Most lexical statistical methods including Xtract face the problem of unable to extract collocations with lower frequencies than a given threshold. This paper presents a method where HowNet is used to find synonyms using a similarity function. Based on such synonym information, we have success...
متن کاملTCtract-A Collocation Extraction Approach for Noun Phrases Using Shallow Parsing Rules and Statistic Models
This paper presents a hybrid method for extracting Chinese noun phrase collocations that combines a statistical model with rule-based linguistic knowledge. The algorithm first extracts all the noun phrase collocations from a shallow parsed corpus by using syntactic knowledge in the form of phrase rules. It then removes pseudo collocations by using a set of statistic-based association measures (...
متن کاملCollocation Extraction: Needs, Feeds And Results Of An Extraction System For German
This paper provides a specification of requirements for collocation extraction systems, taking as an example the extraction of noun + verb collocations from German texts. A hybrid approach to the extraction of habitual collocations and idioms is presented, aiming at a detailed description of collocations and their morphosyntax for natural language generation systems as well as to support learne...
متن کاملEvaluation of Google and Bing online translation of verb-noun collocations from English into Arabic
This article aims to investigate and evaluate the translation of verb-noun collocation in English into Arabic Google and Bing online translation engines. A number of sentences were used as a testing dataset to evaluate both engines. Human translations by three bilingual speakers were used as a gold standard. A simple evaluation metric was proposed to calculate the translation accuracy of verb-n...
متن کامل